Abstract

For a mixed quantum state with density matrix $\rho$ there are infinitely many ensembles of pure quantum states which average to $\rho$. Starting from the Laplace principle of insufficient reason (not to give a priori preference to any particular state), we derive a 'natural' distribution of pure states averaging to $\rho$, which is 'more spread' than all the others.

1 Introduction 

In the classical situation an unknown probability distribution can be estimated by collecting statistics. This is not the case in quantum mechanics when we need to estimate a distribution of pure quantum states: all we can do is estimate the density matrix $\rho$ of the corresponding mixed quantum state. However, there are infinitely many distributions of pure quantum states which average to the same density matrix $\rho$. That is why, in order to estimate the distribution (that is, the ensemble of pure quantum states), some a priori assumption about this distribution is required. Which one?

* Dept. of Mathematics, SPb EF University, Griboyedova 30-32, 191023, St. Petersburg, Russia

† Dept. of Informatics, The State Russian Museum, Inzhenernaya 4, 191186, St. Petersburg, Russia (corresponding author, email: Roman.Zapatrin at gmail.com)

It was Laplace who introduced the formula of classical probability [1]

$$ p = \frac{m}{n} $$

where $n$ stands for the total number of outcomes and $m$ is the number of favorable ones. This formula is not introduced ad hoc. Rather, it is based on the principle of insufficient reason:

if there is no reason to prefer one outcome of an experiment with 
respect to another one, all outcomes are treated equally probable. 

According to Laplace, if we are given a completely unknown distribution and we need to estimate it, we assume it to be uniform. But what should we do if we have additional information about the distribution? Can we still use the Laplace principle?

2 A classical example: exploring a biased die

Suppose we play with a die whose properties are not known. If we are asked what the probability of a certain face to appear is, we intuitively (but in fact according to Laplace) answer: 'there are 6 faces, none of them is preferred, hence any face appears with the same probability 1/6'.

We roll $N$ identical dice and, as a result of this experiment, we learn the mean value of the number of points shown; denote it $M$. This is information about the die: how does it affect our estimate? The hypothesis that all faces are equally probable may no longer be compatible with the observed value of $M$ (indeed, one would scarcely believe a die showing $M = 5$ points on average to be symmetric). So the Laplace principle is not applicable, at least in its direct form.

Consider this in a more general setting. Suppose we have $N$ identical dice with $K$ faces each. Each face $k$ is labeled by a value $A_k$, and the die might be 'biased', that is, the probability of the $k$-th face to appear is an unknown number $p_k$. $N$ such identical dice are rolled, and the average value turns out to be $M$. What can we now say about $p_k$?¹

This average value $M$ can be obtained when we have $n_1$ times face 1, ..., $n_K$ times face $K$, with the values $\{n_1, \dots, n_K\}$ satisfying the equations

$$ n_1 + \dots + n_K = N, \qquad A_1 n_1 + \dots + A_K n_K = M N \qquad (1) $$



When the number $N$ is large, we may treat

$$ p_k = \frac{n_k}{N} \qquad (2) $$
The solution of (1) with respect to $\{n_1, \dots, n_K\}$ is, however, far from unique. Meanwhile, the solutions are not on an equal footing: according to the Bernoulli formula, each particular solution $\{n_1, \dots, n_K\}$ has the a priori probability

$$ P(n_1, \dots, n_K) = \frac{N!}{n_1! \cdots n_K!} \, p_1^{n_1} \cdots p_K^{n_K} $$

Under the null hypothesis of a symmetric die, $p_k = 1/K$, the last factor equals $K^{-N}$ for every solution, so only the multinomial coefficient distinguishes the solutions. Maximizing $P(n_1, \dots, n_K)$ over the solutions $\{n_1, \dots, n_K\}$ satisfying (1), we find the one with the greatest probability and take it as our estimate. Using the Stirling formula we get (see, e.g., [2] for details):

$$ \log P(n_1, \dots, n_K) \approx -N \left( \frac{n_1}{N} \log \frac{n_1}{N} + \dots + \frac{n_K}{N} \log \frac{n_K}{N} \right) \qquad (3) $$

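The right-hand side of (3) is $N$ times the Shannon entropy of the frequencies (2):

$$ \frac{1}{N} \log P(n_1, \dots, n_K) \approx -p_1 \log p_1 - \dots - p_K \log p_K \qquad (4) $$

and the maximum of $\log P(n_1, \dots, n_K)$ subject to (1) is attained at

$$ p_k = \frac{n_k}{N} = \frac{e^{-\beta A_k}}{Z} \qquad (5) $$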

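where $Z$ is the normalizing factor

$$ Z = \sum_k e^{-\beta A_k} \qquad (6) $$

and $\beta$ is a 'temperature parameter', obtained by solving the second equation of (1), written in explicit form,

$$ \frac{A_1 e^{-\beta A_1} + \dots + A_K e^{-\beta A_K}}{e^{-\beta A_1} + \dots + e^{-\beta A_K}} = M \qquad (7) $$

with respect to $\beta$. This gives us definite values of $p_1, \dots, p_K$.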
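The Stirling approximation behind (3) and (4) is easy to check numerically. The short Python sketch below is our own illustration, not part of the original derivation: it computes the logarithm of the multinomial coefficient $N!/(n_1! \cdots n_K!)$ exactly via `math.lgamma` and compares $1/N$ times it with the Shannon entropy of the frequencies $n_k/N$. Natural logarithms are assumed (another base only rescales both sides), and the frequencies are an arbitrary example.

```python
import math


def log_weight(counts):
    """Exact log of the multinomial coefficient N!/(n_1! ... n_K!), via lgamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def shannon_entropy(probs):
    """Shannon entropy -sum p log p in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Arbitrary example frequencies; N is chosen so that every n_k = f_k * N is an integer.
freqs = [0.5, 0.25, 0.125, 0.125]
for n in (8, 80, 800, 8000):
    counts = [round(f * n) for f in freqs]
    print(n, log_weight(counts) / n, shannon_entropy(freqs))
```

The per-trial log-weight approaches the entropy from below, and the gap shrinks roughly like $\log N / N$.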

As an illustration, consider a usual die, that is, $K = 6$, $A_1 = 1, \dots, A_6 = 6$. Begin with the symmetric case $M = (1 + 2 + \dots + 6)/6 = 3.5$. The solution of (7) is $\beta = 0$, which means that the Laplace principle still works and this particular value of $M$ gives no preference to any face; therefore the null hypothesis (the uniform distribution $p_j = 1/6$) should not be rejected.
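Equation (7) generally has no closed-form solution for $\beta$. However, its left-hand side decreases monotonically in $\beta$ (its derivative is minus the variance of $A_k$ under (5)), so bisection is enough. The following Python sketch is our own illustration. The function names and the bracketing interval $[-50, 50]$ are assumptions made for this example. For the symmetric die it returns $\beta \approx 0$, and it can be used to check the biased case $M = 2.5$ discussed below.

```python
import math


def face_probs(beta, values):
    """Distribution p_k = exp(-beta * A_k) / Z of formulas (5)-(6)."""
    weights = [math.exp(-beta * a) for a in values]
    z = sum(weights)  # normalizing factor Z, formula (6)
    return [w / z for w in weights]


def mean_value(beta, values):
    """Left-hand side of equation (7): the mean of A_k under p(beta)."""
    return sum(a * p for a, p in zip(values, face_probs(beta, values)))


def solve_beta(m, values, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve equation (7) for beta by bisection.

    The mean decreases strictly in beta, so the root is unique whenever
    min(values) < m < max(values). The bracket [lo, hi] is an assumption
    that is adequate for small face values such as 1..6.
    """
    if not min(values) < m < max(values):
        raise ValueError("M must lie strictly between the smallest and largest face values")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_value(mid, values) > m:
            lo = mid  # mean still too large: the root lies at larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


faces = [1, 2, 3, 4, 5, 6]
beta = solve_beta(2.5, faces)
print(beta, [round(p, 4) for p in face_probs(beta, faces)])
```

Because of rounding, individual $p_j$ may differ from the quoted values in the last digit.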



¹ Recall that we know only $M$ and we wish to infer from this knowledge the 'natural' values of $p_1, \dots, p_K$.
If the die is 'biased', we obtain a different value of $M$, say, $M = 2.5$. In this case the Laplace principle should be extended: namely, we search for the distribution maximizing the entropy $H = -\sum_j p_j \log p_j$ subject to the constraint on the mean. In our particular case this gives the following answer:

$$ p_j = \frac{e^{-\beta j}}{e^{-\beta} + e^{-2\beta} + \dots + e^{-6\beta}}, \qquad j = 1, \dots, 6 \qquad (8) $$

Solving (7) numerically for $M = 2.5$ gives us $\beta = 0.3710$, that is

$$ \{p_1, p_2, \dots, p_6\} = \{0.3476, 0.2396, 0.1654, 0.1143, 0.0788, 0.0543\} $$

The main message of this section is the following. We provide a completely classical example where we have no knowledge about the input state (distribution) but we need to say something about it. We describe a principle for choosing a concrete distribution on the basis of a small given amount of knowledge.

3 Continuous ensembles

Why is the idea of maximizing the entropy a development of Laplace's idea of symmetry and non-preference? For any given average value we consider all possible distributions which yield this average value. Then we take the distributions which are typical, that is, which occur most often among all possible configurations [1]. The preference is given to what occurs with the maximal number of combinations, expressed by the statistical weight

$$ W = \frac{N!}{n_1! \, n_2! \cdots n_K!} $$

where $N$ is the total number of trials and $n_j$ is the number of occurrences of the $j$-th face (of the $j$-th outcome, more generally).

Now let us develop a similar construction, but passing from numbers to operators, that is, the mean value is now an operator rather than a number $M$ as in (1). The constraint on the mean value then becomes a matrix equation. As a consequence, the parameter $\beta$ (the appropriate Lagrange multiplier) becomes a matrix as well.

Let $\mathcal{H} = \mathbb{C}^n$ be an $n$-dimensional Hermitian space, and let $\rho$ be a density matrix in $\mathcal{H}$. There are infinitely many ensembles whose average density
matrix is $\rho$. Among them we would like to single out a 'natural' one. First suppose this ensemble to be finite and, as in the previous section, maximize its mixing entropy in order to find a natural ensemble. The result is the following: given an arbitrarily large number $E$, we can always find an ensemble of $2^E$ pure states which averages to $\rho$ and whose mixing entropy is $E$ bits. This is a uniform ensemble [3]. So there is no upper bound on the mixing entropy of finite ensembles. As a result, we pass to continuous ensembles with the distribution density expressed by a function $\mu(\phi)$, where $\phi$ ranges over all unit vectors² in $\mathcal{H}$.
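The claim that finite ensembles have unbounded mixing entropy can be made concrete. The Python/NumPy sketch below uses one simple construction of our own, which is not necessarily the one in [3]. Write $\rho = \sum_j \lambda_j |e_j\rangle\langle e_j|$ and take $|\psi_k\rangle = \sum_j \sqrt{\lambda_j}\, \omega^{jk} |e_j\rangle$ with $\omega = e^{2\pi \mathrm{i}/N}$, $k = 0, \dots, N-1$. For $N \ge n$ the cross terms cancel, because $\sum_k \omega^{(j-l)k} = 0$ for $j \ne l$. The uniform mixture of these $N = 2^E$ states is therefore exactly $\rho$, while its mixing entropy is $\log_2 N = E$ bits. The function name `uniform_ensemble` is hypothetical.

```python
import numpy as np


def uniform_ensemble(rho, num_states):
    """Return num_states pure states (as rows) whose equal-weight mixture is rho.

    Illustrative construction: |psi_k> = sum_j sqrt(lam_j) * w**(j*k) |e_j>,
    with w = exp(2*pi*i/N) and {lam_j, |e_j>} the spectral decomposition of rho.
    """
    lam, vecs = np.linalg.eigh(rho)  # columns of vecs are the eigenvectors |e_j>
    lam = np.clip(lam, 0.0, None)    # drop tiny negative rounding errors
    n = len(lam)
    if num_states < n:
        raise ValueError("the construction needs num_states >= dimension of rho")
    k = np.arange(num_states)[:, None]
    idx = np.arange(n)[None, :]
    coeffs = np.sqrt(lam) * np.exp(2j * np.pi * idx * k / num_states)  # shape (N, n)
    return coeffs @ vecs.T  # row k holds the components of |psi_k> in the original basis


E = 5
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
states = uniform_ensemble(rho, 2**E)
average = sum(np.outer(s, s.conj()) for s in states) / len(states)
print(np.allclose(average, rho))  # the equal-weight mixture reproduces rho
```

Increasing `E` adds more states without changing the average, so the mixing entropy grows without bound.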

The set of all self-adjoint operators in $\mathcal{H} = \mathbb{C}^n$ has a natural structure of a real space $\mathbb{R}^{n^2}$, in which the set of all density matrices is contained in the hyperplane $T = 0$ of the affine functional $T(X) = \mathrm{Tr}\,X - 1$. The density operator of a continuous ensemble associated with the measure $\mu(\phi)$ on the set $CB_n$ of unit vectors in $\mathcal{H}$ is calculated as the following (matrix) integral:

$$ \rho = \int_{\phi \in CB_n} \mu(\phi) \, |\phi\rangle\langle\phi| \, d\phi \qquad (9) $$

where $|\phi\rangle\langle\phi|$ is the projector onto the vector $|\phi\rangle$ and $d\phi$ is the normalized measure on $CB_n$ mentioned above, that is, $\int d\phi = 1$. In practice, the operator integral $\rho$ in (9) can be calculated through its matrix elements. In any fixed basis $\{|e_i\rangle\}$ in $\mathcal{H}$, each matrix element $\rho_{ij} = \langle e_i|\rho|e_j\rangle$ is the following numerical integral:

$$ \rho_{ij} = \langle e_i|\rho|e_j\rangle = \int \mu(\phi) \, \langle e_i|\phi\rangle \langle\phi|e_j\rangle \, d\phi \qquad (10) $$
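Formula (10) also suggests a simple numerical check. The NumPy sketch below is our own Monte Carlo illustration, not a method from the paper. It samples unit vectors from the unitarily invariant measure (a normalized standard complex Gaussian vector has exactly this distribution) and replaces the integral by a sample average, which works because $d\phi$ is normalized. The sample size, the seed and the example density $\mu(\phi) = 3|\langle e_1|\phi\rangle|^2$ for $n = 3$ are arbitrary choices. For this density the exact answer is $\mathrm{diag}(1/2, 1/4, 1/4)$, since $\int |\phi_1|^4 d\phi = 2/(n(n+1))$ and $\int |\phi_1|^2 |\phi_2|^2 d\phi = 1/(n(n+1))$.

```python
import numpy as np


def random_unit_vectors(n, samples, rng):
    """Unit vectors in C^n distributed according to the unitarily invariant measure."""
    z = rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def ensemble_density_matrix(mu, n, samples=200_000, seed=0):
    """Monte Carlo estimate of rho_ij = int mu(phi) <e_i|phi><phi|e_j> dphi, formula (10).

    `mu` receives an array of unit vectors of shape (samples, n) and returns
    the density value for each of them; dphi is normalized, so the integral
    becomes a plain sample mean.
    """
    rng = np.random.default_rng(seed)
    phi = random_unit_vectors(n, samples, rng)  # row s holds the components <e_i|phi_s>
    w = mu(phi)
    return np.einsum("s,si,sj->ij", w, phi, phi.conj()) / samples


def uniform(phi):
    """mu = 1: the uniform ensemble."""
    return np.ones(len(phi))


def tilted(phi):
    """Example density 3|<e_1|phi>|^2, normalized for n = 3."""
    return 3.0 * np.abs(phi[:, 0]) ** 2


print(np.round(ensemble_density_matrix(uniform, 3), 3))  # close to I/3
print(np.round(ensemble_density_matrix(tilted, 3), 3))   # close to diag(1/2, 1/4, 1/4)
```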

Kullback–Leibler distance. We quantify the state preparation effort by the difference between the entropy of the uniform distribution (that is, our null hypothesis) and the entropy of the ensemble³ in question. This difference equals the Kullback–Leibler distance [2]

$$ D(\mu \,\|\, \mu_0) = \int \mu(x) \ln \frac{\mu(x)}{\mu_0(x)} \, dx $$

between the distribution $\mu(x)$ and the uniform distribution $\mu_0(x)$ with constant density. We normalize the reference measure $dx$ on the probability space so that $\mu_0 = 1$. This distance is the average log-likelihood ratio, on which the choice of a statistical hypothesis is based. Then, in order to minimize the Type I error, we have to choose the hypothesis with the smallest average log-likelihood ratio.

² Pure states form a projective space rather than the unit sphere in $\mathcal{H}$. On the other hand, one may integrate over any probability space. Usually distributions of pure states over the spectrum of observables are studied, and sometimes probability distributions on projective spaces are considered [5]. In this paper, for technical reasons, we prefer to represent ensembles of pure states by measures on unit vectors in $\mathcal{H}$. We use the Umegaki measure on $CB_n$: the uniform measure with respect to the action of $U(n)$, normalized so that $\int_{CB_n} d\phi = 1$.
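For a qubit ($n = 2$), the quantity $t = |\langle e_1|\phi\rangle|^2$ is uniformly distributed on $[0, 1]$ under the normalized invariant measure. For densities that depend on $\phi$ only through $t$, the Kullback–Leibler distance from the uniform ensemble therefore reduces to a one-dimensional integral. The Python sketch below is our own illustration using the midpoint rule. The example density $\mu(t) = 2t$ is hypothetical and was chosen because the exact value $\int_0^1 2t \ln 2t \, dt = \ln 2 - 1/2$ is easy to verify.

```python
import math


def kl_from_uniform(mu, points=100_000):
    """Midpoint-rule estimate of D = int_0^1 mu(t) ln mu(t) dt for a qubit density mu(t).

    Assumes mu depends on phi only through t = |<e_1|phi>|^2, which is uniform
    on [0, 1] for n = 2; the uniform density mu_0 = 1 therefore gives D = 0.
    """
    h = 1.0 / points
    total = 0.0
    mass = 0.0
    for k in range(points):
        t = (k + 0.5) * h
        m = mu(t)
        mass += m * h
        if m > 0:
            total += m * math.log(m) * h
    if abs(mass - 1.0) > 1e-6:
        raise ValueError("mu is not normalized on [0, 1]")
    return total


print(kl_from_uniform(lambda t: 1.0))      # uniform ensemble: zero distance
print(kl_from_uniform(lambda t: 2.0 * t))  # compare with ln 2 - 1/2
```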

Maximizing the entropy. The problem reduces to the following. For a given density matrix $\rho$, find a continuous ensemble $\mu$ with maximal differential entropy, that is, with minimal distance from the uniform distribution:

$$ S(\mu) = \int \mu(\phi) \ln \mu(\phi) \, d\phi \to \min, \qquad \int \mu(\phi) \, |\phi\rangle\langle\phi| \, d\phi = \rho \qquad (11) $$

where $d\phi$ is the unitarily invariant measure on pure states, normalized to integrate to unity. When there are no constraints in (11) apart from normalization, the answer is straightforward: the minimum (equal to zero) is attained on the uniform distribution. This situation is quite similar to the symmetric classical case considered in Section 2. To solve the problem with constraints, we use the method of Lagrange multipliers [7]. The appropriate Lagrange function reads:

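$$ \mathcal{L}(\mu) = S(\mu) + \mathrm{Tr}\, \Lambda \left( \int \mu(\phi) \, |\phi\rangle\langle\phi| \, d\phi - \rho \right) $$

where the Lagrange multiplier $\Lambda$ is a matrix, since the constraints in (11) are of matrix character. Substituting the expression for $S(\mu)$ and setting the derivative of $\mathcal{L}$ with respect to $\mu$ to zero, we get

$$ \mu(\phi) = \frac{e^{-\mathrm{Tr}\, B |\phi\rangle\langle\phi|}}{Z(B)} = \frac{e^{-\langle\phi|B|\phi\rangle}}{Z(B)} \qquad (12) $$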

where $B$ is the optimal value of the Lagrange multiplier $\Lambda$, which we derive from the constraint in (11), and the normalizing factor

$$ Z(B) = \int e^{-\mathrm{Tr}\, B |\phi\rangle\langle\phi|} \, d\phi \qquad (13) $$

is the partition function for (12).

³ We are speaking here of the mixing entropy [6] of the ensemble rather than of the von Neumann entropy of its density matrix.
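For a qubit the partition function (13) can be written in closed form. As a simplifying assumption for this illustration, take $B = \mathrm{diag}(b_1, b_2)$ in the eigenbasis $\{|e_1\rangle, |e_2\rangle\}$ of $\rho$, and put $d = b_1 - b_2$ and $t = |\langle e_1|\phi\rangle|^2$, which is uniform on $[0, 1]$. Then $\langle\phi|B|\phi\rangle = b_2 + d\,t$, and the factor $e^{-b_2}$ cancels between (12) and (13), so that $\rho_{11} = \int_0^1 t e^{-dt} dt / \int_0^1 e^{-dt} dt = 1/d - 1/(e^d - 1)$. The Python sketch below (function names are ours) evaluates this expression and inverts it by bisection. This yields the ensemble (12) for a prescribed eigenvalue $\rho_{11}$.

```python
import math


def qubit_rho11(d):
    """<e_1|rho|e_1> for mu ~ exp(-<phi|B|phi>) with B = diag(b1, b2), d = b1 - b2.

    Uses that t = |<e_1|phi>|^2 is uniform on [0, 1] for a qubit, so
    rho_11 = int t e^{-d t} dt / int e^{-d t} dt = 1/d - 1/(e^d - 1).
    """
    if abs(d) < 1e-3:
        return 0.5 - d / 12 + d**3 / 720  # Taylor series avoids cancellation near d = 0
    return 1.0 / d - 1.0 / math.expm1(d)


def solve_d(lam, lo=-700.0, hi=700.0, tol=1e-12):
    """Find d with qubit_rho11(d) = lam for 0 < lam < 1 (rho_11 decreases in d)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("need a nondegenerate qubit state: 0 < lam < 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if qubit_rho11(mid) > lam:
            lo = mid  # rho_11 still too large: increase d
        else:
            hi = mid
    return 0.5 * (lo + hi)


d = solve_d(0.2)
print(d, qubit_rho11(d))
```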



4 Conclusions 

In classical mechanics, there is a unique correspondence between mixed states and ensembles of pure states. This is no longer the case in quantum mechanics: if we are given a state described by a density matrix $\rho$, there are infinitely many ensembles of pure quantum states which average to $\rho$. In our paper we consider quantum systems with a finite-dimensional state space $\mathcal{H} = \mathbb{C}^n$. An ensemble of pure states is understood in the most general sense, as a distribution of pure states in $\mathcal{H}$ rather than a finite weighted sum.

Contrary to the conventional approach, we exploit continuous distributions of state vectors (but still in finite dimensions, $\mathbb{C}^n$). The task we tackle is the following. Suppose we are given a quantum state with a density matrix $\rho$, and this is all we know about the preparation procedure. In this setting, what could we say about the ensemble which gave rise to $\rho$? To answer this question, we use the standard statistical approach: among all ensembles averaging to $\rho$ we choose the one which is more spread than the others. What does 'more spread' mean?

We consider all ensembles averaging to $\rho$ and, following Laplace, give no preference to any of them. As stated above, by an ensemble we mean a distribution, and as the null hypothesis we take it to be uniform: no knowledge, no preference. However, when $\rho \neq \mathbb{1}/n$, this hypothesis is not compatible with our knowledge that the average quantum state is $\rho$. In order to comply with the Laplace principle [1] we choose the distribution of pure states which

• averages to the state $\rho$

• has the greatest differential entropy

The resulting distribution solves the problem (11): $S(\mu) = \int \mu(\phi) \ln \mu(\phi) \, d\phi \to \min$. So, summarizing our result:

if we have a source of particles whose average quantum state is $\rho$, and this is the only information about the source of these particles, we have to state that they are prepared as follows: pure states are emitted with the probability density (12),

$$ \mu(\phi) = \frac{e^{-\langle\phi|B|\phi\rangle}}{Z(B)} $$
The existence of lazy ensembles



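Let $\rho$ be a nondegenerate density operator, that is, $0 < \lambda_1 \le \dots \le \lambda_n$, where $\{\lambda_j\}$ are the eigenvalues of the operator $\rho$. We are going to prove that for any such $\rho$ there exists an operator $Y$ such that

$$ \int e^{\langle\phi|Y|\phi\rangle} \, |\phi\rangle\langle\phi| \, d\phi = \rho \qquad (14) $$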



From now on all the integrals are taken over the unit sphere in $\mathbb{C}^n$ (unless explicitly written otherwise), and $d\phi$ is the invariant measure on it induced by the Lebesgue measure in $\mathbb{R}^{2n}$, normalized so that $\int d\phi = 1$. Fix a density matrix $\rho$ and consider the scalar-valued function

$$ Z(Y) = \int e^{\langle\phi|Y|\phi\rangle} \, d\phi - \mathrm{Tr}(Y\rho) \qquad (15) $$

and search for its minimum. If the minimum exists, then the necessary condition for it is the vanishing of all partial derivatives

$$ \frac{\partial Z}{\partial Y} = 0, $$

which is equivalent to (14). That is, if the minimal value of (15) exists, then the $Y$ at which it is attained yields the solution of (14). So we only have to prove the existence of the minimum of the function (15). Consider a sequence of functions $Z_M$ parameterized by an integer $M$:



$$ Z_M(Y) = \int \left( 1 + \frac{\langle\phi|Y|\phi\rangle}{2M} \right)^{2M} d\phi - \mathrm{Tr}(Y\rho) $$

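For these functions the existence of a minimum is straightforward, since $Z_M$ is an integral of nonnegative convex functions of $Y$ minus a linear term. That means the (operator) equation

$$ \int \left( 1 + \frac{\langle\phi|Y|\phi\rangle}{2M} \right)^{2M-1} |\phi\rangle\langle\phi| \, d\phi = \rho \qquad (16) $$

always has a solution $Y$ (this $Y$ depends on $M$). Let us estimate the eigenvalues of $Y$; denote them $y_1 \le \dots \le y_n$. Then for any unit vector $\phi$

$$ y_1 \le \langle\phi|Y|\phi\rangle \le y_n $$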
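hence, since $x \mapsto x^{2M-1}$ is increasing and $|\phi\rangle\langle\phi| \ge 0$,

$$ \left( 1 + \frac{y_1}{2M} \right)^{2M-1} \int |\phi\rangle\langle\phi| \, d\phi \;\le\; \int \left( 1 + \frac{\langle\phi|Y|\phi\rangle}{2M} \right)^{2M-1} |\phi\rangle\langle\phi| \, d\phi \;\le\; \left( 1 + \frac{y_n}{2M} \right)^{2M-1} \int |\phi\rangle\langle\phi| \, d\phi $$

The middle term equals $\rho$ by (16). Taking the trace and using $\mathrm{Tr} \int |\phi\rangle\langle\phi| \, d\phi = \int d\phi = 1$, we get

$$ \left( 1 + \frac{y_1}{2M} \right)^{2M-1} \le \mathrm{Tr}\,\rho \le \left( 1 + \frac{y_n}{2M} \right)^{2M-1} $$

Since we are dealing with a density operator $\rho$, its trace equals 1, therefore

$$ y_1 \le 0 \le y_n \qquad (17) $$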

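Now let us estimate the difference $y_n - y_1$ between the greatest and the least eigenvalues of $Y$ from (16). Let $|f_k\rangle$ be an eigenvector of $Y$ associated with the eigenvalue $y_k$, and put $t_k = |\langle f_k|\phi\rangle|^2$, so that $t_1 + \dots + t_n = 1$ and $\langle\phi|Y|\phi\rangle = y_n + \sum_{k=1}^{n-1} (y_k - y_n) t_k$. Under the normalized invariant measure, $(t_1, \dots, t_{n-1})$ is uniformly distributed on the simplex $t_1 + \dots + t_{n-1} \le 1$, with density $(n-1)!$. Since $\lambda_1$ is the smallest eigenvalue of $\rho$, taking the matrix element $\langle f_1| \cdot |f_1\rangle$ of (16) gives

$$ \lambda_1 \le \langle f_1|\rho|f_1\rangle = (n-1)! \int_{t_1 + \dots + t_{n-1} \le 1} \left( 1 + \frac{y_n + \sum_{k=1}^{n-1} (y_k - y_n) t_k}{2M} \right)^{2M-1} t_1 \, dt_1 \cdots dt_{n-1} $$

Denote the base of the power by $g = 1 + \frac{1}{2M} \left( y_n + \sum_{k=1}^{n-1} (y_k - y_n) t_k \right)$. Since $\partial g / \partial t_1 = (y_1 - y_n)/2M$, we have $(y_n - y_1) \, g^{2M-1} = -\partial (g^{2M}) / \partial t_1$. Integrating by parts in $t_1$ from $0$ to $1 - t_2 - \dots - t_{n-1}$, we therefore obtain

$$ (y_n - y_1) \langle f_1|\rho|f_1\rangle = -(n-1)! \int_{t_2 + \dots + t_{n-1} \le 1} \Big[ t_1 \, g^{2M} \Big]_{t_1 = 1 - t_2 - \dots - t_{n-1}} dt_2 \cdots dt_{n-1} \; + \; (n-1)! \int_{t_1 + \dots + t_{n-1} \le 1} g^{2M} \, dt_1 \cdots dt_{n-1} $$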
The first summand in the above expression is minus an integral of a nonnegative function (an even power). The second summand is the following integral over the simplex, which we denote by $K$:

$$ K = (n-1)! \int_{t_1 + \dots + t_{n-1} \le 1} \left( 1 + \frac{y_n + \sum_{k=1}^{n-1} (y_k - y_n) t_k}{2M} \right)^{2M} dt_1 \cdots dt_{n-1} $$

Rewritten in a more familiar form as an integral over the unit sphere, it reads

$$ K = \int \left( 1 + \frac{\langle\phi|Y|\phi\rangle}{2M} \right)^{2M} d\phi $$

then

$$ (y_n - y_1) \lambda_1 \le K \qquad (18) $$

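Using (16) and taking into account that $\left(1 + \frac{x}{2M}\right)^{2M} = \left(1 + \frac{x}{2M}\right)^{2M-1} \left(1 + \frac{x}{2M}\right)$, we have

$$ K = \mathrm{Tr}\,\rho + \frac{1}{2M} \int \left( 1 + \frac{\langle\phi|Y|\phi\rangle}{2M} \right)^{2M-1} \langle\phi|Y|\phi\rangle \, d\phi = 1 + \frac{\mathrm{Tr}(Y\rho)}{2M} \le 1 + \frac{y_n - y_1}{2M} $$

since $\mathrm{Tr}(Y\rho) \le y_n \le y_n - y_1$ by (17). Therefore

$$ (y_n - y_1) \lambda_1 \le K \le 1 + \frac{y_n - y_1}{2M} $$

so $(y_n - y_1)\left(\lambda_1 - \frac{1}{2M}\right) \le 1$. That is, for any $M > 1/\lambda_1$ we have $\lambda_1 - \frac{1}{2M} > \frac{\lambda_1}{2} > 0$, therefore

$$ y_n - y_1 < \frac{2}{\lambda_1} \qquad (19) $$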

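This means that for sufficiently large $M$ the solutions $\{y_1, \dots, y_n\}$ of (16) remain in the compact set

$$ y_1 \le 0 \le y_n, \qquad y_n - y_1 \le \frac{2}{\lambda_1} $$

Hence the operators $Y$ solving (16) stay bounded, and a subsequence of them converges as $M \to \infty$. Since $\left(1 + \frac{x}{2M}\right)^{2M-1} \to e^x$ uniformly on bounded sets, the limit satisfies (14). This means that for any nondegenerate density operator $\rho$ there always exists the appropriate lazy ensemble

$$ \int e^{\langle\phi|Y|\phi\rangle} \, |\phi\rangle\langle\phi| \, d\phi = \rho $$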
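The objects of this appendix can be made explicit for a qubit. The Python sketch below is our own illustration. It relies on the simplifying assumption that $\rho = \mathrm{diag}(\lambda_1, 1 - \lambda_1)$ and $Y = \mathrm{diag}(y_1, y_2)$ in the same basis. With $t = |\langle e_1|\phi\rangle|^2$ uniform on $[0, 1]$ and $d = y_1 - y_2$, we have $\langle\phi|Y|\phi\rangle = y_2 + d\,t$. The trace of (14) gives $e^{y_2}(e^d - 1)/d = 1$. The $(1,1)$ entry of (14) then says that the mean of $t$ under the weight $e^{dt}$ equals $\lambda_1$. This equation fixes $d$ (found by bisection), and the trace condition fixes $y_2$. The function names are ours. The test checks (14) by quadrature together with the bounds (17) and (19).

```python
import math


def tilted_mean(d):
    """Mean of t on [0, 1] under the weight e^{d t}: 1/(1 - e^{-d}) - 1/d (increasing in d)."""
    if abs(d) < 1e-3:
        return 0.5 + d / 12 - d**3 / 720  # Taylor series avoids cancellation near d = 0
    return -1.0 / math.expm1(-d) - 1.0 / d


def qubit_lazy_Y(lam1, tol=1e-12):
    """Diagonal entries (y1, y2) of Y solving (14) for rho = diag(lam1, 1 - lam1)."""
    if not 0.0 < lam1 < 1.0:
        raise ValueError("rho must be nondegenerate: 0 < lam1 < 1")
    lo, hi = -700.0, 700.0
    while hi - lo > tol:  # the normalized (1,1) entry of (14) fixes d = y1 - y2
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < lam1:
            lo = mid
        else:
            hi = mid
    d = 0.5 * (lo + hi)
    # trace of (14): int e^{<phi|Y|phi>} dphi = e^{y2} (e^d - 1)/d = 1
    y2 = 0.0 if d == 0.0 else -math.log(math.expm1(d) / d)
    return y2 + d, y2


y1, y2 = qubit_lazy_Y(0.2)
print(y1, y2)
```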